Improving Temporal Difference Learning for Deterministic Sequential Decision Problems

Authors

  • Thomas Ragg
  • Heinrich Braun
Abstract

We present a new connectionist framework for solving deterministic sequential decision problems based on temporal difference learning and dynamic programming. The framework is compared to Tesauro's approach of learning a strategy for playing backgammon, a probabilistic board game. We show that his approach is not applicable to deterministic games, but that simple modifications lead to impressive results. Furthermore, we demonstrate that a combination with methods from dynamic programming and the introduction of a re-learning factor can reduce the necessary training time by several orders of magnitude and improve the overall performance.
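The paper's own update scheme is not spelled out in this abstract; as background, a minimal tabular TD(0) value update on an assumed deterministic transition model can be sketched as follows (the chain, rewards, step size, and episode count are illustrative, not taken from the paper):

```python
# Minimal tabular TD(0) sketch for a deterministic sequential decision
# problem. The chain, rewards, and hyperparameters below are
# illustrative assumptions, not the paper's setup.

def td0_deterministic(next_state, reward, episodes, alpha=0.5, gamma=1.0):
    """Learn state values V(s) along a fixed deterministic trajectory."""
    V = {s: 0.0 for s in next_state}
    V[None] = 0.0  # value of the terminal state
    for _ in range(episodes):
        s = 0  # fixed start state
        while s is not None:
            s2 = next_state[s]
            # TD(0) update: move V(s) toward the target r + gamma * V(s')
            V[s] += alpha * (reward[s] + gamma * V[s2] - V[s])
            s = s2
    return V

# A three-state deterministic chain 0 -> 1 -> 2 -> terminal, with
# reward 1.0 received on leaving state 2; with gamma = 1 every state's
# value converges toward 1.0.
next_state = {0: 1, 1: 2, 2: None}
reward = {0: 0.0, 1: 0.0, 2: 1.0}
V = td0_deterministic(next_state, reward, episodes=100)
```

Because the transitions are deterministic, each update propagates the terminal reward one state further back per episode, which is exactly the slow credit-assignment behavior that combining TD learning with dynamic programming aims to speed up.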


Similar articles

Analytical Mean Squared Error Curves in Temporal Difference Learning

We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its steps...

An Analysis of Temporal Difference Learning with Function Approximation

We discuss the temporal difference learning algorithm as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator on-line during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence with probabili...
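The on-line linear-approximation setting described in this abstract can be sketched concretely; the chain, features, step-size schedule, and discount factor below are illustrative assumptions, not taken from the paper:

```python
# Sketch of on-line TD(0) with a linear function approximator running
# along a single trajectory of a small irreducible, aperiodic Markov
# chain. All concrete choices here are illustrative assumptions.
import random

def linear_td0(phi, step_fn, s0, n_steps, gamma=0.9):
    """Update theta so that V(s) ~= theta . phi(s) along one trajectory."""
    theta = [0.0] * len(phi(s0))
    s = s0
    for t in range(n_steps):
        s2, r = step_fn(s)                     # sample next state and reward
        v = sum(th * f for th, f in zip(theta, phi(s)))
        v2 = sum(th * f for th, f in zip(theta, phi(s2)))
        delta = r + gamma * v2 - v             # TD error
        alpha = 50.0 / (50.0 + t)              # decaying step size
        theta = [th + alpha * delta * f for th, f in zip(theta, phi(s))]
        s = s2
    return theta

# Two-state chain: from either state, jump to 0 or 1 uniformly at
# random; reward 1.0 is received when leaving state 0. One-hot features
# make this a tabular special case, so theta approaches the true values
# V(0) = 5.5, V(1) = 4.5 (the solution of V = r + gamma * E[V(s')]).
random.seed(0)
phi = lambda s: [1.0, 0.0] if s == 0 else [0.0, 1.0]
step_fn = lambda s: (random.randint(0, 1), 1.0 if s == 0 else 0.0)
theta = linear_td0(phi, step_fn, s0=0, n_steps=100_000)
```

With general (non-one-hot) features the same loop projects the value function onto the span of the features, which is the regime whose convergence the paper analyzes.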

Learning and value function approximation in complex decision processes

In principle, a wide variety of sequential decision problems, ranging from dynamic resource allocation in telecommunication networks to financial risk management, can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortuna...

Evolutionary Algorithms for Reinforcement Learning

There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evo...

Stable Function Approximation in Dynamic Programming

The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area have produced mixed results; there have been both notable successes and notable disappointments. Theory has been scarce, mostly due to the difficulty of reasoning about function approximators that gen...
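As a reference point for the dynamic-programming side of this combination, tabular value iteration on a small deterministic MDP can be sketched as follows (the states, actions, and rewards are illustrative assumptions, not from the paper):

```python
# Sketch of tabular value iteration, the dynamic-programming counterpart
# of the temporal difference methods discussed above. The two-state MDP
# below is an illustrative assumption.

def value_iteration(next_state, reward, gamma=0.9, tol=1e-10):
    """next_state[s][a] -> deterministic successor, reward[s][a] -> reward."""
    V = [0.0] * len(next_state)
    while True:
        # Bellman optimality backup over all states
        V_new = [
            max(reward[s][a] + gamma * V[next_state[s][a]]
                for a in range(len(next_state[s])))
            for s in range(len(next_state))
        ]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: a single action stays in state 1 (reward 2).
next_state = [[0, 1], [1]]
reward = [[0.0, 1.0], [2.0]]
V = value_iteration(next_state, reward)
# V[1] = 2 / (1 - 0.9) = 20 and V[0] = 1 + 0.9 * 20 = 19
```

Replacing the exact table `V` with a fitted function approximator is precisely the step whose stability the abstract says has produced mixed results.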

Journal:

Volume   Issue

Pages  -

Publication date: 1995